Unsupervised Adaptation for Statistical Machine Translation
نویسندگان
چکیده
In this work, we tackle the problem of language and translation models domainadaptation without explicit bilingual indomain training data. In such a scenario, the only information about the domain can be induced from the source-language test corpus. We explore unsupervised adaptation, where the source-language test corpus is combined with the corresponding hypotheses generated by the translation system to perform adaptation. We compare unsupervised adaptation to supervised and pseudo supervised adaptation. Our results show that the choice of the adaptation (target) set is crucial for successful application of adaptation methods. Evaluation is conducted over the German-to-English WMT newswire translation task. The experiments show that the unsupervised adaptation method generates the best translation quality as well as generalizes well to unseen test sets.
منابع مشابه
Mixture-Modeling with Unsupervised Clusters for Domain Adaptation in Statistical Machine Translation
In Statistical Machine Translation, in-domain and out-of-domain training data are not always clearly delineated. This paper investigates how we can still use mixture-modeling techniques for domain adaptation in such cases. We apply unsupervised clustering methods to split the original training set, and then use mixture-modeling techniques to build a model adapted to a given target domain. We sh...
متن کاملA Multi-Domain Translation Model Framework for Statistical Machine Translation
While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, al...
متن کاملLanguage Model Adaptation for Statistical Machine Translation with Structured Query Models
We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from very large monolingual text collection. Specific language models are then build from the retrieved data and interpolated with a genera...
متن کاملLanguage Model Adaptation for Statistical Machine Translation via Structured Query Models
We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from very large monolingual text collection. Specific language models are then build from the retrieved data and interpolated with a genera...
متن کاملStatistical Post-Editing of Machine Translation for Domain Adaptation
This paper presents a statistical approach to adapt out-of-domain machine translation systems to the medical domain through an unsupervised post-editing step. A statistical post-editing model is built on statistical machine translation (SMT) outputs aligned with their translation references. Evaluations carried out to translate medical texts from French to English show that an out-of-domain mac...
متن کامل